
Assignment 1

In the course MSB104 Econometrics, we hand in one assignment this year, divided into four parts spread over the semester. The assignments must be written and calculated in R. We are group one, and the countries represented in our assignment are Denmark, France, Hungary, Portugal and Slovakia.

In this assignment we downloaded two datasets from Eurostat: GDP (nama_10r_3gdp) and population (demo_r_pjanaggr3) at NUTS3 level for the years 2000 to 2020. Once we have extracted the information we need, we calculate GDP per capita and describe the data using Eurostat's metadata descriptions.

In the second part of this assignment we use our data to calculate population-weighted GDP Gini coefficients for the European NUTS2 (j) level and describe the new data. We then plot the distribution of the Gini coefficients. At the end of the first assignment we discuss whether there are noteworthy outliers.

Eurostat reports GDP in several units, and the unit is stored in a column named “UNIT”. We have chosen the values where the unit is MIO_EUR, which gives GDP in million euros.

We made a new dataset called gdppop, in which we combine the information we need from the two Eurostat datasets. This makes the rest of our analysis easier.

Summary Statistics
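| Variable | N | Mean | Std. Dev. | Min | Pctl. 25 | Pctl. 75 | Max |
|---|---|---|---|---|---|---|---|
| Year | 3465 | 2010 | 6.056 | 2000 | 2005 | 2015 | 2020 |
| GDP | 3465 | 15488.351 | 22719.905 | 557 | 3816.47 | 18363.95 | 251623.58 |
| unit | 3465 | | | | | | |
| … MIO_EUR | 3465 | 100% | | | | | |
| Population | 3327 | 585736.518 | 477586.457 | 39583 | 261285.5 | 709348.5 | 2863272 |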

GDP per Capita

To calculate GDP per capita, we divided GDP by the population. Since GDP is measured in million euros, we then multiplied by one million so that GDP per capita is expressed in euros.
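A minimal base-R sketch of this step, assuming a data frame laid out like our gdppop dataset with columns `GDP` (in million euros, unit MIO_EUR) and `Population`. The region codes and numbers below are made up for illustration.

```r
# Hypothetical toy data laid out like gdppop (values are illustrative only)
gdppop <- data.frame(
  Region     = c("XX001", "XX002"),
  Year       = c(2010, 2010),
  GDP        = c(1500, 300),      # million EUR
  Population = c(50000, 20000),
  stringsAsFactors = FALSE
)

# GDP is in million euros, so multiply by 1e6 to get euros per inhabitant
gdppop$gdp_per_capita <- gdppop$GDP * 1e6 / gdppop$Population
print(gdppop)
```

Forgetting the factor 1e6 would give GDP per capita in million euros, i.e. numbers around 0.03 instead of 30000.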

Briefly about what the summary shows: the minimum and maximum are the lowest and highest GDP per capita among the regions. The first quartile is the value below which 25 % of the observations lie (and above which 75 % lie), halfway between the minimum and the median in the sorted data. The median is the middle value when the observations are sorted, so half of the observations lie below it and half above it. The third quartile is, correspondingly, the value below which 75 % of the observations lie. Finally, the mean is the average of all the regional observations.
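A small base-R illustration of these statistics on made-up numbers. Note that `summary()` and `quantile()` use R's default quantile definition (type 7); other software may interpolate the quartiles slightly differently.

```r
# Made-up GDP per capita values for nine regions
x <- c(3, 7, 8, 5, 12, 14, 21, 13, 18)

summary(x)                        # Min, 1st Qu., Median, Mean, 3rd Qu., Max
quantile(x, c(0.25, 0.50, 0.75))  # quartiles with R's default type 7
median(x)                         # middle value of the sorted data: 12
mean(x)                           # arithmetic average: 101 / 9
```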

We then look at GDP, population and GDP per capita for our five countries, first with all the NUTS3 regions pooled together.

| GDP_per_Capita |
|---|
| 2.24e+04 |

The average GDP per capita across all NUTS3 regions and years in our five countries is 22 438,88 euros. With nothing to compare it with, this figure on its own tells us little, so we also look at the NUTS3 regions within each country.
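The figure of 22 438,88 euros equals the mean of gdp_per_capita in the summary that follows, i.e. an unweighted average in which a small region counts as much as a large one. A minimal sketch on made-up numbers shows how this differs from aggregate GDP divided by aggregate population, which is the same as a population-weighted mean.

```r
# Two hypothetical regions: GDP in million EUR and population
gdp <- c(1500, 300)
pop <- c(50000, 20000)
pc  <- gdp * 1e6 / pop          # regional GDP per capita: 30000, 15000

mean(pc)                        # unweighted mean of the regions: 22500
sum(gdp) * 1e6 / sum(pop)       # aggregate GDP / aggregate population
weighted.mean(pc, w = pop)      # same as the aggregate figure
```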

##     Region               Year           GDP               unit          
##  Length:3327        Min.   :2000   Min.   :   683.7   Length:3327       
##  Class :character   1st Qu.:2005   1st Qu.:  3873.0   Class :character  
##  Mode  :character   Median :2010   Median :  8012.6   Mode  :character  
##                     Mean   :2010   Mean   : 15668.1                     
##                     3rd Qu.:2015   3rd Qu.: 18416.7                     
##                     Max.   :2020   Max.   :251623.6                     
##    Population      gdp_per_capita  
##  Min.   :  39583   Min.   :  2976  
##  1st Qu.: 261286   1st Qu.: 14754  
##  Median : 428922   Median : 22260  
##  Mean   : 585736   Mean   : 22439  
##  3rd Qu.: 709348   3rd Qu.: 26560  
##  Max.   :2863272   Max.   :116235

It was not easy to draw conclusions from the summary above. GDP varies a lot: the lowest value is 683,7 while the highest is 251 623,6 (million euros). The maximum is also far above the third quartile, which is only 18 416,7. Population and gdp_per_capita show the same pattern, with a low minimum and a very high maximum.

To get a better picture of the different countries and to see whether some regions stand out, we make summaries per country. This gives us the opportunity to exclude such regions in later assignments. We also look at one specific year for all the countries to compare the results.

To get the results by country, we made a summary of GDP per capita for each country. This gives an overview of whether there are major inequalities between the regions within each country. If we find strong deviations, we can choose to remove some regions so that a few extreme values do not dominate the results.
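A minimal base-R sketch of per-country summaries, assuming the country is given by the first two characters of the NUTS3 code. The values are made up, and Hungary and Slovakia are pooled into one group as in our analysis.

```r
# Hypothetical regional data: NUTS3 codes start with the two-letter country code
d <- data.frame(
  Region = c("DK011", "DK014", "FR101", "FRI22", "HU110", "SK010"),
  gdp_per_capita = c(60000, 30000, 90000, 20000, 22000, 33000),
  stringsAsFactors = FALSE
)

d$country <- substr(d$Region, 1, 2)
# Pool Hungary and Slovakia into one group
d$group <- ifelse(d$country %in% c("HU", "SK"), "HU_SK", d$country)

# One summary() per group, and the group means as a data frame
tapply(d$gdp_per_capita, d$group, summary)
agg <- aggregate(gdp_per_capita ~ group, data = d, FUN = mean)
print(agg)
```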

Since Hungary and Slovakia have few NUTS3 regions, we chose to look at them together.

Denmark

Denmark is a small country in Scandinavia with few NUTS3 regions. Copenhagen (København) is the capital of Denmark, and as we can see above, it stands out together with the surrounding region, which has also had higher growth. Capital regions are often richer than other regions, and we see this in Denmark as well.

To get an overview, we have chosen to look at 2010 for all the regions in Denmark to see which regions are the wealthiest and the least wealthy.
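The ranking can be done by sorting on gdp_per_capita and keeping the first three rows. A minimal base-R sketch with hypothetical region labels and made-up values:

```r
# Hypothetical 2010 values for five regions (labels and numbers are made up)
dk2010 <- data.frame(
  Region = c("R1", "R2", "R3", "R4", "R5"),
  gdp_per_capita = c(31000, 64000, 29000, 42000, 30000),
  stringsAsFactors = FALSE
)

# Sort descending for the wealthiest, ascending for the poorest
richest <- head(dk2010[order(dk2010$gdp_per_capita, decreasing = TRUE), ], 3)
poorest <- head(dk2010[order(dk2010$gdp_per_capita), ], 3)
richest
poorest
```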

The wealthiest regions in Denmark

| Year | Region | gdp_per_capita |
|---|---|---|
| 2010 | DK011 | 6.47e+04 |
| 2010 | DK012 | 6.38e+04 |
| 2010 | DK032 | 4.27e+04 |

The poorest regions in Denmark

| Year | Region | gdp_per_capita |
|---|---|---|
| 2010 | DK022 | 2.93e+04 |
| 2010 | DK014 | 2.97e+04 |
| 2010 | DK021 | 3.15e+04 |

##       Year         Region          gdp_per_capita 
##  Min.   :2010   Length:11          Min.   :29276  
##  1st Qu.:2010   Class :character   1st Qu.:32793  
##  Median :2010   Mode  :character   Median :36862  
##  Mean   :2010                      Mean   :40570  
##  3rd Qu.:2010                      3rd Qu.:41504  
##  Max.   :2010                      Max.   :64695

The three wealthiest regions in Denmark are Byen København (DK011), Københavns omegn (DK012) and Sydjylland (DK032). The three least wealthy regions are Vest- og Sydsjælland (DK022), Bornholm (DK014) and Østsjælland (DK021).

Denmark has 11 NUTS3 regions. Among the wealthiest, the capital and its surrounding region stand out, while GDP per capita is more even among the poorest regions. In 2010 the difference between the richest region (64,695 euros) and the poorest (29,276 euros) was about 35,000 euros.

France

##    Population      gdp_per_capita  
##  Min.   :  73851   Min.   :  8292  
##  1st Qu.: 290938   1st Qu.: 21721  
##  Median : 523771   Median : 24361  
##  Mean   : 643505   Mean   : 26286  
##  3rd Qu.: 818596   3rd Qu.: 27871  
##  Max.   :2606234   Max.   :116235

France is a country in Western Europe that also has overseas regions in other parts of the world. France has the largest land area in the EU, and it has over a hundred NUTS3 regions.

Since France has so many NUTS3 regions, it is not easy to tell them apart. Here too, the capital Paris stands out. The summary shows a large gap between the minimum and the maximum. When we look more closely at the numbers, part of this gap is due to France's overseas regions (the FRY codes), which are included in the data. These regions, located in the Caribbean, South America and the Indian Ocean, pull down the minimum GDP per capita for France. For further research, we have chosen to remove them from the dataset.
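Removing the overseas regions amounts to dropping every row whose NUTS3 code starts with "FRY". A minimal base-R sketch with made-up values (with dplyr the same filter would be `filter(!startsWith(Region, "FRY"))`):

```r
# Hypothetical French NUTS3 rows; FRY codes are the overseas regions
fr <- data.frame(
  Region = c("FR101", "FRI22", "FRY10", "FRY50"),
  gdp_per_capita = c(88000, 18000, 21000, 8300),
  stringsAsFactors = FALSE
)

# Keep only regions whose code does not start with "FRY"
fr_mainland <- fr[!startsWith(fr$Region, "FRY"), ]
fr_mainland
```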

The wealthiest regions in France

| Year | Region | gdp_per_capita |
|---|---|---|
| 2010 | FR105 | 9.24e+04 |
| 2010 | FR101 | 8.76e+04 |
| 2010 | FRK26 | 4.1e+04 |

The poorest regions in France

| Year | Region | gdp_per_capita |
|---|---|---|
| 2010 | FRI22 | 1.82e+04 |
| 2010 | FRJ21 | 1.88e+04 |
| 2010 | FRF32 | 1.93e+04 |

##       Year         Region          gdp_per_capita 
##  Min.   :2010   Length:96          Min.   :18154  
##  1st Qu.:2010   Class :character   1st Qu.:21951  
##  Median :2010   Mode  :character   Median :24289  
##  Mean   :2010                      Mean   :26566  
##  3rd Qu.:2010                      3rd Qu.:28190  
##  Max.   :2010                      Max.   :92362

After this, France has 96 NUTS3 regions. The three wealthiest regions in 2010 are Hauts-de-Seine (FR105), Paris (FR101) and Rhône (FRK26). Hauts-de-Seine and Paris stand out clearly from the other regions, with a GDP per capita roughly 50,000 euros higher than Rhône. The three poorest regions are Creuse (FRI22), Ariège (FRJ21) and Meuse (FRF32), which have quite similar GDP per capita.

Hungary and Slovakia

Hungary and Slovakia are both countries in Central Europe. The capital regions Bratislava and Budapest both had strong growth until 2008. After that, Budapest declined a little, while Bratislava continued to grow strongly. The rest of the regions have had steadier growth.

The wealthiest regions in Hungary and Slovakia

| Year | Region | gdp_per_capita |
|---|---|---|
| 2010 | SK010 | 3.27e+04 |
| 2010 | HU110 | 2.19e+04 |
| 2010 | SK021 | 1.42e+04 |

The poorest regions in Hungary and Slovakia

| Year | Region | gdp_per_capita |
|---|---|---|
| 2010 | HU313 | 4.39e+03 |
| 2010 | HU323 | 5.38e+03 |
| 2010 | HU332 | 5.72e+03 |

##       Year         Region          gdp_per_capita 
##  Min.   :2010   Length:28          Min.   : 4395  
##  1st Qu.:2010   Class :character   1st Qu.: 6629  
##  Median :2010   Mode  :character   Median : 8018  
##  Mean   :2010                      Mean   : 9561  
##  3rd Qu.:2010                      3rd Qu.:10106  
##  Max.   :2010                      Max.   :32670

Together, Hungary and Slovakia have 28 NUTS3 regions. The three wealthiest are Bratislava (SK010), Budapest (HU110) and Trnava (SK021), so in both countries the capital is the richest region. The three least wealthy regions are Nógrád (HU313), Szabolcs-Szatmár-Bereg (HU323) and Békés (HU332). All three poorest regions are in Hungary, which means that Hungary's poorest regions have a lower GDP per capita than Slovakia's. GDP per capita also varies considerably between the regions of Hungary.

In 2010, the difference between the wealthiest and the poorest region was about 17,500 euros within Hungary (Budapest versus Nógrád) and about 28,000 euros across both countries (Bratislava versus Nógrád). Bratislava is the richest region, and since none of Slovakia's regions are among the poorest, these figures suggest that Slovakia has a higher GDP per capita than Hungary.

Portugal

Portugal is a country in southern Europe. It also has two archipelagos, each forming its own region. We see steady growth in all the regions: all of them had a slight decline between 2010 and 2012, and after that GDP per capita increased again. In Portugal, too, the capital Lisbon stands out as the richest region.

The wealthiest regions in Portugal

| Region | gdp_per_capita |
|---|---|
| PT170 | 2.41e+04 |
| PT181 | 2.14e+04 |
| PT150 | 1.7e+04 |

The poorest regions in Portugal

| Region | gdp_per_capita |
|---|---|
| PT11C | 9.92e+03 |
| PT16J | 1.05e+04 |
| PT11B | 1.08e+04 |

##     Region          gdp_per_capita 
##  Length:25          Min.   : 9919  
##  Class :character   1st Qu.:12449  
##  Mode  :character   Median :14708  
##                     Mean   :14558  
##                     3rd Qu.:15762  
##                     Max.   :24120

Portugal has 25 NUTS3 regions. The wealthiest are Lisboa (PT170), Alentejo Litoral (PT181) and Algarve (PT150), and Lisboa has a somewhat higher GDP per capita than the other two. The poorest regions are Tâmega e Sousa (PT11C), Beiras e Serra da Estrela (PT16J) and Alto Tâmega (PT11B). The difference in GDP per capita between the wealthiest and the poorest region is approximately 14,000 euros.

In all the countries, the capital regions stand out the most and have the highest GDP per capita. The Paris area (Hauts-de-Seine and Paris) has the highest GDP per capita of all, and the poorest regions are found in Hungary.

Descriptive statistics of regional inequity (Gini, NUTS2) and a brief discussion of noteworthy outliers

Next, we use the data to calculate population-weighted GDP Gini coefficients for our countries at the NUTS2 (j) level.
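One common definition of a population-weighted Gini for a NUTS2 region j, built from its NUTS3 regions i and k, is given below; we assume it matches the definition used in the course. Here \(y_i\) is GDP per capita of NUTS3 region i, \(p_i\) its population and \(P_j\) the total population of region j:

\[ \text{Gini}_j = \frac{1}{2\bar{y}_j} \sum_{i \in j} \sum_{k \in j} \frac{p_i}{P_j} \frac{p_k}{P_j} \lvert y_i - y_k \rvert, \qquad \bar{y}_j = \sum_{i \in j} \frac{p_i}{P_j} y_i \]

A minimal base-R sketch of this formula; the helper `weighted_gini` and the `XX` region codes are hypothetical, and the NUTS2 code is taken as the first four characters of the NUTS3 code.

```r
# Population-weighted Gini for one NUTS2 region
# y: GDP per capita of its NUTS3 regions, w: their populations
weighted_gini <- function(y, w) {
  s  <- w / sum(w)                 # population shares p_i / P_j
  mu <- sum(s * y)                 # population-weighted mean
  sum(outer(s, s) * abs(outer(y, y, "-"))) / (2 * mu)
}

# Hypothetical NUTS3 data: XX01 has two NUTS3 regions, XX02 only one
d <- data.frame(
  Region = c("XX011", "XX012", "XX020"),
  gdp_per_capita = c(10000, 30000, 25000),
  Population = c(100, 100, 500),
  stringsAsFactors = FALSE
)
d$nuts2 <- substr(d$Region, 1, 4)   # NUTS2 code = first four characters

gini_n2 <- sapply(split(d, d$nuts2),
                  function(g) weighted_gini(g$gdp_per_capita, g$Population))
gini_n2
```

Under this definition, a NUTS2 region that contains only one NUTS3 region always gets a Gini of 0, since there is no difference between subregions to measure.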

A Gini coefficient lies between 0 and 1. A value of 0 means perfect equality, and the closer it is to 1, the greater the inequality. In our case, the Gini coefficient of a NUTS2 region measures how unevenly GDP per capita is distributed across the NUTS3 regions within it, with each NUTS3 region weighted by its population. When we have calculated the Gini coefficients, we also check the data for outliers, that is, values that are very high or very low compared with the rest of the data.
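The text does not say which outlier test was used. As one common choice, here is a sketch of the 1.5 × IQR rule (the rule behind boxplot whiskers) using R's default quartiles. The helper `flag_outliers` and the Gini values are hypothetical; `boxplot.stats()` uses Tukey's hinges instead, so its fences can differ slightly on small samples.

```r
# Flag values more than coef * IQR below the 1st or above the 3rd quartile
flag_outliers <- function(x, coef = 1.5) {
  q   <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  x < q[1] - coef * iqr | x > q[2] + coef * iqr
}

gini <- c(0.05, 0.06, 0.07, 0.08, 0.30)   # made-up Gini values
flag_outliers(gini)
```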

Gini for all countries all years

## [1] 0.2846063
Summary Statistics
| Variable | N | Mean | Std. Dev. | Min | Pctl. 25 | Pctl. 75 | Max |
|---|---|---|---|---|---|---|---|
| gini_n2 | 46 | 0.057 | 0.046 | 0 | 0.03 | 0.073 | 0.261 |

First, we look at all the regions in the selected countries. We have 46 observations, one for each NUTS2 region in 2010. The overall Gini for all five countries over the last 20 years is 0.28, but we expect the differences between countries to be averaged out in this single figure, so we also look at each country separately. Some Gini values are 0, which means perfect equality between the NUTS3 regions within a NUTS2 region. We look at each country to find out where these zeros occur and why. To look more closely at the countries, we have chosen to use only the year 2010.

| Year | nuts2 | gini_n2 |
|---|---|---|
| 2010 | DK01 | 0.114 |
| 2010 | DK02 | 0.0153 |
| 2010 | DK03 | 0.053 |
| 2010 | DK04 | 0.00978 |
| 2010 | DK05 | 0 |

Outliers in Denmark

| nuts2 |
|---|
| DK05 |

In the graph above, two regions have varied quite a bit over the past 20 years. They still stay below 0.025, which shows that there is little difference between richer and poorer NUTS3 regions within them. Another region that stands out is DK01, which lies higher up in the graph than the other regions. Its Gini is further from 0, so inequality is greater here than in the other Danish regions.

In Denmark, the Gini coefficients in 2010 lie between 0 and 0.11. Nordjylland (DK05) is an outlier with a Gini of 0. This is not because data are missing: DK05 consists of a single NUTS3 region (DK050), so there is no regional difference to measure and its Gini is 0 by construction. Denmark has only five NUTS2 regions, which does not give us much data to work with.

France

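| Year | nuts2 | gini_n2 |
|---|---|---|
| 2010 | FR10 | 0.261 |
| 2010 | FRB0 | 0.0624 |
| 2010 | FRC1 | 0.0728 |
| 2010 | FRC2 | 0.0722 |
| 2010 | FRD1 | 0.0717 |
| 2010 | FRD2 | 0.0567 |
| 2010 | FRE1 | 0.0521 |
| 2010 | FRE2 | 0.0318 |
| 2010 | FRF1 | 0.0452 |
| 2010 | FRF2 | 0.0836 |
| 2010 | FRF3 | 0.0373 |
| 2010 | FRG0 | 0.052 |
| 2010 | FRH0 | 0.0677 |
| 2010 | FRI1 | 0.0813 |
| 2010 | FRI2 | 0.0342 |
| 2010 | FRI3 | 0.0239 |
| 2010 | FRJ1 | 0.0707 |
| 2010 | FRJ2 | 0.13 |
| 2010 | FRK1 | 0.0611 |
| 2010 | FRK2 | 0.118 |
| 2010 | FRL0 | 0.068 |
| 2010 | FRM0 | 0.0493 |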

Outliers in France

| nuts2 |
|---|
| (none) |

In France, there are so many regions that it is difficult to tell them apart in the graph above. What the graph does show is that the vast majority of regions move evenly together below 0.1. Some regions stand out: FR10 (Île-de-France) is the highest, with a Gini of up to about 0.3, while FRJ1 and FRK1 fluctuate quite a bit from 2005 to 2020.

In France, the Gini coefficients in 2010 lie between 0.02 and 0.26, which is slightly higher than in Denmark, so regional inequality is slightly greater in France. After we removed the FRY regions, France does not have any outliers.

Hungary and Slovakia

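| Year | nuts2 | gini_n2 |
|---|---|---|
| 2010 | HU11 | 0 |
| 2010 | HU12 | 0 |
| 2010 | HU21 | 0.0686 |
| 2010 | HU22 | 0.086 |
| 2010 | HU23 | 0.0299 |
| 2010 | HU31 | 0.065 |
| 2010 | HU32 | 0.0764 |
| 2010 | HU33 | 0.0515 |

| Year | nuts2 | gini_n2 |
|---|---|---|
| 2010 | SK01 | 0 |
| 2010 | SK02 | 0.0712 |
| 2010 | SK03 | 0.054 |
| 2010 | SK04 | 0.0814 |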

Outliers in Hungary and Slovakia

| nuts2 |
|---|
| HU11 |
| HU12 |
| SK01 |

Hungary and Slovakia have large fluctuations in their regions. One of the regions with the largest fluctuations is HU22, where the Gini falls to 0.075 in 2012 and rises to approximately 0.12 in 2016. Hungary and Slovakia are small countries in Central Europe, and we suspect that this is why the fluctuations are large.

In Hungary, the Gini coefficients in 2010 lie between 0 and about 0.09. Hungary has two outliers, Budapest (HU11) and Pest (HU12), both with a Gini of 0. Each of them consists of a single NUTS3 region (HU110 and HU120), so their Gini is 0 by construction rather than because data are missing.

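In Slovakia, the Gini coefficients also lie between 0 and about 0.08. Slovakia has one outlier, the Bratislava region (SK01), which consists of a single NUTS3 region (SK010) and therefore has a Gini of 0. This leaves Slovakia with only three regions that tell us something about regional inequality.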

Portugal

| Year | nuts2 | gini_n2 |
|---|---|---|
| 2010 | PT11 | 0.0849 |
| 2010 | PT15 | 0 |
| 2010 | PT16 | 0.0671 |
| 2010 | PT17 | 0 |
| 2010 | PT18 | 0.0747 |
| 2010 | PT20 | 0 |
| 2010 | PT30 | 0 |

Outliers in Portugal

| nuts2 |
|---|
| PT15 |
| PT17 |
| PT20 |
| PT30 |

In the graph for Portugal, the Gini has been declining in several regions over the past 20 years, which means that the differences between richer and poorer NUTS3 regions have narrowed. PT18 stands out with strong fluctuations over the years.

In Portugal, the Gini coefficients in 2010 lie between 0 and about 0.085. Portugal has four outliers: Algarve (PT15), Lisboa (PT17), Região Autónoma dos Açores (PT20) and Região Autónoma da Madeira (PT30). Each of them consists of a single NUTS3 region, so their Gini is 0 by construction; the Azores and Madeira are archipelagos that form their own regions. Portugal does not have many NUTS2 regions, and when four of the seven have a Gini of 0, we cannot place much confidence in the result.

Discuss briefly if there are noteworthy outliers

To summarize assignment 1: we have calculated GDP per capita for all our countries combined and for each country. With all the countries combined, we got 22,805.13 euros in GDP per capita. Denmark had a considerably higher GDP per capita than the other countries, and Hungary had the lowest, with only 8,781.69 euros. France has overseas regions on other continents, and we chose to remove these so that they would not affect the results for mainland France.

When we look at outliers, we do not find any in France, but we do in all the other countries. All the outliers are NUTS2 regions that consist of only one NUTS3 region, which makes their Gini 0. In Hungary, Slovakia and Portugal, most of them are linked to the capitals: HU11, SK01 and PT17 are capital regions, and HU12 (Pest) surrounds Budapest. Capital regions are often defined as a single NUTS3 region. Algarve (PT15) is also a single-region NUTS2, and Portugal's two island groups (PT20 and PT30) come up as outliers as well. The last outlier is in Denmark: Nordjylland (DK05).
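This pattern can be checked directly by counting NUTS3 regions per NUTS2 region. A minimal sketch using the 11 Danish NUTS3 codes (NUTS 2016 classification, which we assume matches our data), taking the NUTS2 code as the first four characters:

```r
# Danish NUTS3 codes (11 regions)
nuts3 <- c("DK011", "DK012", "DK013", "DK014", "DK021", "DK022",
           "DK031", "DK032", "DK041", "DK042", "DK050")

n_per_nuts2 <- table(substr(nuts3, 1, 4))     # number of NUTS3 regions per NUTS2
single <- names(n_per_nuts2)[n_per_nuts2 == 1]
single                                        # NUTS2 regions whose Gini must be 0
```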

Assignment 2

In the second assignment we look at growth and inequity. We estimate the effect of regional development on regional inequality for the year 2010 and then discuss the goodness of fit of the estimated model. We plot the relationship between regional development and regional inequality together with the fitted line from our estimate, and we plot the residuals against the predicted values of the model. Finally, we discuss the classical OLS assumptions in light of our data and plots, as well as other determinants of inequity.

We also go back to Eurostat's web pages and download regional (NUTS2, j) data for our subset of countries related to transport infrastructure, education and demographics. We select one variable per category that we would like to explore further in its relationship to regional inequality. We then estimate a multiple linear regression model with the new variables for 2010 and give a short interpretation of our findings. Finally, we discuss the overall fit of the model and the inference related to our findings.

Data set:

We start assignment 2 by downloading the datasets from Eurostat. From the education dataset we look at people with higher education, from the transport dataset at the length of motorways in kilometres, and from the demographic dataset at life expectancy.

Growth and inequity

Further in the assignment, we look at growth and inequity in the countries at NUTS2 level.
Before we go further, we create new variables for the dataset. We then estimate linear models (lm), starting with simple regression. A simple regression model describes the relationship between two variables (R for Everyone, p. 265): the intercept gives the predicted value of Y when X = 0, and the slope gives the change in Y for a one-unit increase in X.

The Gini value goes from 0 to 1, where 0 is perfect equality and 1 is maximal inequality. As we can see in the summary, there are gini…. A Gini of 0 occurs for NUTS2 regions that consist of a single NUTS3 region, where there is no regional inequality to measure. We therefore use filter to remove the Gini values that are zero.

In this ggplot, we can see how the Gini is distributed in each country. France has a point that stands out from the others, while the other countries have most of their points between 0 and 0.1. We have to keep in mind that Denmark, Hungary, Portugal and Slovakia have few observations at NUTS2 level, which is why we choose to look at France on its own and at all the countries together when we move forward in the assignment.

Estimate the effect of regional development on regional inequality (Gini) for the year 2010

To estimate the effect of regional development on regional inequality, we can use the model:

\[ \text{Regional inequality}_i = \beta_1 + \beta_2 \, \text{Regional development}_i + u_i \]

Here \(\beta_1\) is the expected regional inequality when regional development is 0, and the slope \(\beta_2\) shows how much inequality changes for each one-unit increase in development.
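A minimal base-R sketch of this interpretation with `lm()`, on made-up data that lie exactly on the line \(y = 2 + 3x\), so the estimates reproduce \(\beta_1 = 2\) and \(\beta_2 = 3\):

```r
# Hypothetical data lying exactly on the line y = 2 + 3x
x <- c(1, 2, 3, 4, 5)
y <- 2 + 3 * x

fit <- lm(y ~ x)
coef(fit)                         # intercept (beta_1) = 2, slope (beta_2) = 3
predict(fit, data.frame(x = 0))   # prediction at x = 0 equals the intercept
```

With real data the points scatter around the line, and `lm()` picks the intercept and slope that minimise the sum of squared residuals.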


In this ggplot we look at the relationship between the Gini and GDP per capita in 2010 for all countries. When we look at all the countries together, very few of the points lie on the line.

| | Denmark | France | Hungary_Slovakia | Portugal | Total |
|---|---|---|---|---|---|
| gdp_per_capita | 0.405 | 0.758 *** | 0.293 | -0.396 | 0.125 * |
| | (0.138) | (0.081) | (0.299) | (0.885) | (0.056) |
| const. | -11806.326 | -13096.883 *** | 4054.608 | 13294.565 | 3983.460 ** |
| | (5781.918) | (2225.178) | (2550.306) | (12829.967) | (1429.894) |
| N | 4 | 22 | 9 | 3 | 38 |
| R² | 0.813 | 0.814 | 0.121 | 0.167 | 0.122 |

Note: *** p < 0.001; ** p < 0.01; * p < 0.05. Standard errors in parentheses.

The table above shows the estimated effect of regional development (X) on regional inequality (Y): the coefficient on gdp_per_capita tells us how much Y changes when X increases by one unit. Portugal is the only country with a negative coefficient, but with only 3 observations and a standard error (0.885) more than twice the size of the estimate, it is far from statistically significant. We look at this more closely country by country. Based on this and earlier observations, we choose to remove Paris from the France2010 dataset.
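The significance stars are consistent with the numbers in parentheses being standard errors: dividing an estimate by its standard error gives the t statistic. A quick base-R check using the France and Portugal slopes from the table, assuming n − 2 residual degrees of freedom for a simple regression; the helper `t_and_p` is hypothetical.

```r
# t statistic and two-sided p-value from an estimate and its standard error
# n observations, k estimated parameters (intercept + slope)
t_and_p <- function(est, se, n, k = 2) {
  t <- est / se
  c(t = t, p = 2 * pt(-abs(t), df = n - k))
}

france   <- t_and_p(0.758, 0.081, n = 22)   # slope for France
portugal <- t_and_p(-0.396, 0.885, n = 3)   # slope for Portugal
france
portugal
```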

Discuss the goodness of fit of our estimated model

To judge how well the simple regressions fit, we use R². R² is the share of the variance in the dependent variable that is explained by the model, and it lies between 0 and 1. When R² is 1, the independent variable explains all of the variation in the dependent variable; when it is 0, it explains none of it (Forskningsmetode, p. 345). Denmark and France have high R² values, 0.813 and 0.814. Denmark, however, has only 4 observations, so we cannot fully trust this result. France, on the other hand, has 22 observations, and more observations make the result more reliable. When we look at all the countries together, R² is only 0.122; we can see that in the context of……
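A minimal base-R sketch of how R² is computed, on made-up data: one minus the ratio of the residual sum of squares to the total sum of squares, which matches the value reported by `summary()`.

```r
x <- 1:6
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2)   # made-up data
fit <- lm(y ~ x)

rss <- sum(residuals(fit)^2)     # variation left unexplained by the model
tss <- sum((y - mean(y))^2)      # total variation in y
r2  <- 1 - rss / tss             # share of the variation explained
r2
summary(fit)$r.squared           # R's value, identical up to rounding
```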


Plot the residuals against the predicted values of our model

When we carry out a regression analysis, we get a fitted line that gives the best linear fit to the data. A residual is the difference between an observed value and the value predicted by this line, that is, its vertical distance to the line. Residuals can be positive or negative: if a point lies above the line, the residual is positive, and if it lies below, it is negative.
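A minimal base-R sketch of residuals and a residuals-versus-predicted plot, on made-up data. With an intercept in the model, the residuals always sum to zero.

```r
x <- 1:6
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2)   # made-up data
fit <- lm(y ~ x)

res <- residuals(fit)            # observed minus fitted values
fitted_vals <- fitted(fit)
# positive residual = point above the line, negative = below
plot(fitted_vals, res, xlab = "Predicted values", ylab = "Residuals")
abline(h = 0, lty = 2)           # reference line at zero
```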

When we look at all the countries together, we see both positive and negative residuals around the regression line.

As mentioned earlier, we now look only at France, which with 22 observations is more trustworthy than the other countries, and at all the countries together. In both the graph for all countries and the one for France, some of France's regions stand out. Ideally the points lie close to the line, but most of the regions lie well above or below it, which means there is large variation between the regions. France also has a much steeper line than all the countries together. A steeper line means that a given increase in GDP per capita is associated with a larger increase in the Gini.

Discuss the classical OLS assumptions in light of our data and plots

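There are seven classical OLS assumptions for linear regression. When the first six hold, OLS gives the best linear unbiased estimates; the assumption that the error term is normally distributed is optional and mainly matters for hypothesis testing.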

Assumption 1: The regression model is linear in the coefficients and the error term

When we look at all the countries, the relationship is close to linear, but when we look at France, which has the most NUTS2 regions, we can see that

Assumption 2: The error term has a population mean of zero

There is variation in the X variable both in France and in all the countries combined, which we need in order to estimate the slope at all. Since our models include a constant term, the residuals have a mean of zero by construction, which suggests that the second assumption is fulfilled.

Assumption 3: All independent variables are uncorrelated with the error term

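The third assumption requires that the independent variables are not correlated with the error term; it would for example be violated if an omitted variable affected both GDP per capita and inequality. We have made changes such as removing regions in France, but we still regard the remaining dataset as representative of the population of regions.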

Assumption 4: Observations of the error term are uncorrelated with each other

Assumption 5: The error term has a constant variance (no heteroscedasticity)

When we look at the residual plot for France, the residuals do not form an even band around a flat line, which suggests heteroscedasticity.
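The visual impression could be backed up with a formal test. A minimal base-R sketch of the studentized (Koenker) Breusch-Pagan test, which regresses the squared residuals on the regressors and compares n·R² with a chi-squared distribution. The helper `bp_test` and the data are hypothetical, and this is not a test we ran on our data; `bptest()` in the lmtest package computes the same statistic by default.

```r
# Studentized Breusch-Pagan test: n * R^2 from regressing the squared
# OLS residuals on the model's regressors (model must have an intercept)
bp_test <- function(model) {
  e2   <- residuals(model)^2
  Z    <- model.matrix(model)[, -1, drop = FALSE]  # regressors without intercept
  aux  <- lm(e2 ~ Z)
  stat <- length(e2) * summary(aux)$r.squared
  df   <- ncol(Z)
  list(statistic = stat, df = df,
       p.value = pchisq(stat, df = df, lower.tail = FALSE))
}

# Hypothetical data whose error spread grows with x
x <- rep(1:10, each = 20)
s <- rep(c(1, -1), times = 100)   # alternating signs within each x value
y <- 1 + 2 * x + x * s            # error term x * s: its size grows with x
res <- bp_test(lm(y ~ x))
res$p.value                       # very small: constant variance is rejected
```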

Assumption 6: The error term is normally distributed (optional)

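```{r normal-qq-plot}
# `ols` is assumed to be the fitted lm() model from above; qqnorm() needs
# a numeric vector, so we pass the residuals rather than the model itself
qqnorm(residuals(ols))
qqline(residuals(ols))
```

When we look at the normal Q-Q plot, the residuals do not appear to be normally distributed.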

https://www.datasciencecentral.com/7-classical-assumptions-of-ordinary-least-squares-ols-linear/

Other determinants of inequity

We have chosen to look at motorway length (km), education and life expectancy, to see how these new variables relate to the Gini.

Estimate a multiple linear regression model with your new variables for 2010 and give a small interpretation of your findings.

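In a multiple linear regression model, we use several explanatory variables to explain the Gini (Y). Each slope coefficient shows the change in Y from a one-unit increase in that variable, holding the other variables constant. We can use the formula: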

\[ Y_i = \beta_0 + \beta_1X_{1i} + \beta_2X_{2i}+... + \beta_kX_{ki}+u_i \]
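A minimal base-R sketch of a multiple regression with two explanatory variables, on made-up data generated exactly by \(Y = 1 + 2X_1 - 0.5X_2\), so `lm()` recovers the three coefficients:

```r
# Hypothetical data generated exactly by Y = 1 + 2*X1 - 0.5*X2
x1 <- c(1, 2, 3, 4, 5, 6)
x2 <- c(3, 1, 4, 1, 5, 9)
y  <- 1 + 2 * x1 - 0.5 * x2

fit <- lm(y ~ x1 + x2)
coef(fit)   # (Intercept) = 1, x1 = 2, x2 = -0.5
```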

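| | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 |
|---|---|---|---|---|---|---|
| gdp_per_capita | 0.613 | 0.495 | -1.028 | 0.416 | 0.712 | 0.501 |
| | (0.520) | (0.205) | (NaN) | (0.107) | (NaN) | (NaN) |
| Education | 544.974 | | -5620.249 | | 773.104 | |
| | (1279.489) | | (NaN) | | (NaN) | |
| Motorway | | 20.253 | 160.693 | | | 19.432 |
| | | (29.146) | (NaN) | | | (NaN) |
| Lifeexp | | | | -2527.220 | -2821.860 | -2480.632 |
| | | | | (1652.517) | (NaN) | (NaN) |
| const. | -34060.858 | -20252.333 | 150689.366 | 187990.794 | 179714.043 | 176204.288 |
| | (52787.904) | (13885.965) | (NaN) | (130721.394) | (NaN) | (NaN) |
| N | 4 | 4 | 4 | 4 | 4 | 4 |
| R² | 0.841 | 0.874 | 1.000 | 0.944 | 1.000 | 1.000 |

Note: *** p < 0.001; ** p < 0.01; * p < 0.05. Standard errors in parentheses.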

In Denmark, the coefficient on education is positive in models 1 and 5 but negative in model 3, the coefficient on motorway length is positive in all the models where it appears (2, 3 and 6), and the coefficient on life expectancy is negative in all of them (4, 5 and 6). Models 3, 5 and 6 have an R² of exactly 1 and NaN standard errors. This is not a sign of a good model: with only 4 observations and 4 estimated parameters, the model fits the data perfectly by construction, and there are no degrees of freedom left to estimate the standard errors. Since we only have 4 observations from Denmark, we cannot draw any conclusions about whether these variables are related.
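A minimal base-R sketch of why this happens, on made-up data with 4 observations and 4 parameters (intercept plus three slopes), the same situation as models 3, 5 and 6 for Denmark:

```r
# Hypothetical data: 4 observations and 4 parameters (intercept + 3 slopes)
d <- data.frame(
  y  = c(3, 1, 4, 1),
  x1 = c(1, 2, 3, 4),
  x2 = c(2, 1, 4, 3),
  x3 = c(1, 4, 9, 16)
)
fit <- lm(y ~ x1 + x2 + x3, data = d)

df.residual(fit)                            # 0 degrees of freedom left
summary(fit)$r.squared                      # 1: a perfect fit by construction
summary(fit)$coefficients[, "Std. Error"]   # not finite: no df left to estimate them
```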

| | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 |
|---|---|---|---|---|---|---|
| gdp_per_capita | 0.610 * | 0.514 | 0.460 | 0.502 | 0.487 | 0.344 |
| | (0.245) | (0.291) | (0.297) | (0.268) | (0.273) | (0.314) |
| Education | -99.006 | | -92.916 | | -64.507 | |
| | (95.904) | | (96.604) | | (101.650) | |
| Motorway | | 2.004 | 1.867 | | | 1.942 |
| | | (2.045) | (2.055) | | | (2.008) |
| Lifeexp | | | | 838.106 | 693.520 | 823.480 |
| | | | | (632.397) | (682.321) | (633.720) |
| const. | -6263.825 | -7875.226 | -3535.765 | -74859.196 | -60697.052 | -70598.569 |
| | (7685.735) | (6935.886) | (8286.314) | (48460.099) | (54101.928) | (48747.192) |
| N | 21 | 21 | 21 | 21 | 21 | 21 |
| R² | 0.344 | 0.340 | 0.374 | 0.367 | 0.381 | 0.400 |

Note: *** p < 0.001; ** p < 0.01; * p < 0.05. Standard errors in parentheses.

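France is the country with the most observations. Motorway length has a positive coefficient in all the models where it appears (2, 3 and 6), and so does life expectancy (4, 5 and 6), while education has a negative coefficient (1, 3 and 5). However, none of these coefficients is statistically significant, and R² is fairly low (between 0.340 and 0.400), so we cannot say with certainty that these variables have any effect on the Gini.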

| | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 |
|---|---|---|---|---|---|---|
| gdp_per_capita | 0.432 | 0.209 | 0.889 | 0.399 | 0.463 | 0.403 |
| | (0.551) | (0.308) | (0.519) | (0.700) | (0.807) | (0.700) |
| Education | 56.488 | | 317.996 | | 51.957 | |
| | (182.446) | | (205.217) | | (214.170) | |
| Motorway | | -8.937 | -19.707 | | | -9.369 |
| | | (8.544) | (10.367) | | | (9.368) |
| Lifeexp | | | | -319.675 | -128.476 | -596.659 |
| | | | | (1873.970) | (2187.748) | (1894.324) |
| const. | 1919.114 | 6237.527 | -3153.585 | 27039.266 | 11327.858 | 49242.688 |
| | (7418.924) | (3282.168) | (6742.626) | (134766.511) | (160421.754) | (136582.962) |
| N | 9 | 9 | 9 | 9 | 9 | 9 |
| R² | 0.134 | 0.256 | 0.497 | 0.125 | 0.135 | 0.271 |

Note: *** p < 0.001; ** p < 0.01; * p < 0.05. Standard errors in parentheses.

Again we have merged Hungary and Slovakia for this multiple regression analysis. Education has a positive coefficient in all the models where it appears (1, 3 and 5), while motorway length (models 2, 3 and 6) and life expectancy (models 4, 5 and 6) have negative coefficients. None of these coefficients is statistically significant, and R² is low (between 0.125 and 0.497) with only 9 observations, so the variables do not show a strong relationship with the Gini.

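| | Model 1 | Model 2 | Model 3 |
|---|---|---|---|
| gdp_per_capita | 6.898 | -3.723 | 6.898 |
| | (NaN) | (NaN) | (NaN) |
| Education | 3978.251 | | 3978.251 |
| | (NaN) | | (NaN) |
| Lifeexp | | -4768.800 | |
| | | (NaN) | |
| const. | -379852.307 | 443293.670 | -379852.307 |
| | (NaN) | (NaN) | (NaN) |
| N | 3 | 3 | 3 |
| R² | 1.000 | 1.000 | 1.000 |

Note: *** p < 0.001; ** p < 0.01; * p < 0.05. Standard errors in parentheses.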

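Portugal had no observations for motorways, so we removed this variable, which leaves three models. All three models have an R² of exactly 1 and NaN standard errors: with only 3 observations, each model fits the data perfectly by construction, so this R² says nothing about the relationship. Model 3 gives exactly the same estimates as model 1, which suggests that one of its explanatory variables was dropped because it could not be estimated with so few observations. The coefficient on gdp_per_capita is positive in models 1 and 3 and negative in model 2. Since we only have 3 observations, we cannot put any faith in these results.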
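A minimal base-R sketch of what `lm()` does when there are more parameters than observations, on made-up data: the term that cannot be estimated is reported as `NA`, and the remaining parameters fit the data exactly.

```r
# Hypothetical data: 3 observations but 4 parameters (intercept + 3 slopes)
d <- data.frame(
  y  = c(2, 5, 4),
  x1 = c(1, 2, 3),
  x2 = c(2, 1, 4),
  x3 = c(5, 3, 1)
)
fit <- lm(y ~ x1 + x2 + x3, data = d)

coef(fit)                # x3 is NA: lm() drops the aliased (inestimable) term
fit$rank                 # 3 parameters actually estimated
summary(fit)$r.squared   # 1: three parameters fit three points exactly
```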

Discuss the overall fit of your model

France is the only country with many observations in 2010, and it probably gives us the best basis for judging whether the model fits the data. We want to find out whether education and motorway length affect inequality in France, and whether these variables are connected to how wealthy the various regions are.

Assignment 3

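```{r}
library(dplyr)    # %>% and filter()
library(ggplot2)  # ggplot(), geom_point(), labs()
library(broom)    # tidy()
library(knitr)    # kable()

# Gini against education for the Danish NUTS2 regions in 2010
BNP2010 %>%
  filter(Year == 2010 & nuts0 == "DK") %>%
  ggplot(aes(x = edu, y = Gini, fill = id_nuts2, color = id_nuts2)) +
  geom_point(size = 0.8) +   # point size; newer ggplot2 maps `lwd` to linewidth
  labs(x = "Education", y = "Gini")

# Simple regression of the Gini on education for France, as a table
lm(Gini2 ~ Edu, data = France2010) %>%
  tidy() %>%
  kable(digits = 2)
```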



Lecture

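```{r}
library(lmtest)  # coeftest()
library(car)     # hccm(): heteroscedasticity-consistent covariance matrix

reg <- lm(gdp_n2 ~ lea + gdp_per_capita + Edu, data = gdppop2010)
coeftest(reg)                 # coefficients with ordinary OLS standard errors
reg2 <- lm(log(gdp_n2) ~ log(lea) + log(gdp_per_capita) + log(Edu), data = gdppop2010)
coeftest(reg, vcov. = hccm)   # same coefficients with robust standard errors
```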


#### Appendix

Plots of the residuals against the predicted values of our model



<img src="Assignment-MSB104_files/figure-html/unnamed-chunk-26-1.png" width="672" />




<img src="Assignment-MSB104_files/figure-html/unnamed-chunk-27-1.png" width="672" />

